Supplementary material for the paper: “Adaptive Bandits: Towards the best history-dependent strategy”

Authors

Odalric-Ambrym Maillard

Rémi Munos

Abstract

In this document, we give further details on some technical proofs that are not covered in the paper accompanying this supplementary material.

1 Playing against an opponent using a known model

1.1 Regret upper bounds against the best history-class-based strategy

Theorem 1. In the case of a Φ-constrained opponent, using the Φ-UCB algorithm with parameter $\alpha > 1/2$, we have the distribution-dependent bound

$$R_T \le \sum_{c \in \mathcal{H}/\Phi;\ \mathbb{E}(I_c(T)) > 0} \ \sum_{a \in \mathcal{A};\ \Delta_c(a) > 0} \left( \frac{4\alpha \log(T)}{\Delta_c(a)} + \Delta_c(a)\, c_\alpha \right),$$

where $I_c(T) = \sum_{t=1}^{T} \mathbb{I}[h$ … $0\}|$ is the number of classes that may be activated during the run. Now, in the case of an arbitrary opponent, using the Φ-Exp3 algorithm, we have

$$\tilde{R}_T \le 3\sqrt{2}\,\sqrt{TCA\log(A)}.$$

Proof. Φ-UCB: The distribution-dependent bound for Φ-UCB is a direct application of the result of [2] for the UCB algorithm. That result concerns $\tau_a(t) \overset{\text{def}}{=} \sum_{s=1}^{t} \mathbb{I}_{a_s = a}$, where $a_s$ is the arm played by UCB at time $s$, and states that

$$\mathbb{E}(\tau_a(t)) \le \frac{4\alpha \log(t)}{\Delta_c(a)^2} + c_\alpha.$$

Indeed, we use the fact that $R^{\Phi}_T = \sum_{c \in \mathcal{H}/\Phi} R_T(c)$. Whenever a class $c$ is visited, we play according to a UCB algorithm dedicated to this class. Thus, for the distribution-free bound, we have: $R_T = \sum$ …

Appearing in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS) 2011, Fort Lauderdale, FL, USA. Volume 15 of JMLR: W&CP 15. Copyright 2011 by the authors.
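The proof reduces Φ-UCB to running one UCB instance per history class. A minimal Python sketch of that reduction follows. It rests on illustrative assumptions that are not the paper's exact specification:

- the class map `phi` is a stand-in;
- the index is $\hat\mu_{c,a} + \sqrt{\alpha \log(t_c)/n_{c,a}}$, where $t_c$ counts visits to class $c$;
- rewards are Bernoulli in [0, 1];
- the parity-based opponent in `run_parity_opponent` is a hypothetical example.

```python
import math
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Tuple


class PhiUCB:
    """Phi-UCB sketch: an independent UCB instance for each history class.

    Assumptions (illustrative only): `phi` maps the history to a hashable
    class label, rewards lie in [0, 1], and the index of arm a in class c is
    mean + sqrt(alpha * log(t_c) / n_{c,a}), where t_c is the number of
    visits to class c so far.
    """

    def __init__(self, n_arms: int, phi: Callable[[Sequence], Hashable], alpha: float = 1.0):
        if alpha <= 0.5:
            raise ValueError("Theorem 1 requires alpha > 1/2")
        self.n_arms = n_arms
        self.phi = phi
        self.alpha = alpha
        self.visits: Dict[Hashable, int] = defaultdict(int)  # t_c
        self.counts: Dict[Hashable, List[int]] = defaultdict(lambda: [0] * n_arms)
        self.sums: Dict[Hashable, List[float]] = defaultdict(lambda: [0.0] * n_arms)

    def select(self, history: Sequence) -> int:
        c = self.phi(history)
        self.visits[c] += 1
        counts, sums = self.counts[c], self.sums[c]
        # Pull each arm once in this class before trusting the index.
        for a in range(self.n_arms):
            if counts[a] == 0:
                return a
        log_t = math.log(self.visits[c])
        return max(range(self.n_arms),
                   key=lambda a: sums[a] / counts[a] + math.sqrt(self.alpha * log_t / counts[a]))

    def update(self, history: Sequence, arm: int, reward: float) -> None:
        # Must be called with the same history that was passed to select().
        c = self.phi(history)
        self.counts[c][arm] += 1
        self.sums[c][arm] += reward


def run_parity_opponent(T: int = 4000, seed: int = 0) -> Tuple[PhiUCB, float]:
    """Hypothetical Phi-constrained opponent: the class is the parity of the
    round, and Bernoulli reward means depend only on (class, arm)."""
    means = {0: [0.9, 0.3], 1: [0.2, 0.8]}
    rng = random.Random(seed)
    agent = PhiUCB(n_arms=2, phi=lambda h: len(h) % 2, alpha=1.0)
    history: List[Tuple[int, float]] = []
    total = 0.0
    for _ in range(T):
        arm = agent.select(history)
        reward = 1.0 if rng.random() < means[len(history) % 2][arm] else 0.0
        agent.update(history, arm, reward)
        history.append((arm, reward))
        total += reward
    return agent, total / T
```

In this hypothetical example, either fixed arm earns 0.55 on average, while the best parity-based strategy earns 0.85. Calling `run_parity_opponent()` returns the agent and its average reward. The per-class entries of `agent.counts` show which arm each class settled on.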

Similar resources

Adaptive Bandits: Towards the best history-dependent strategy

We consider multi-armed bandit games with possibly adaptive opponents. We introduce models Θ of constraints based on equivalence classes on the common history (information shared by the player and the opponent) which define two learning scenarios: (1) The opponent is constrained, i.e. he provides rewards that are stochastic functions of equivalence classes defined by some model θ∗ ∈ Θ. The regr...

Adaptive Control Strategy for a Bilateral Tele-Surgery System Interacting with Active Soft Tissues

In this paper, the problem of control and stabilization of a bilateral tele-surgery robotic system in interaction with an active soft tissue is considered. To the best of the authors’ knowledge, the previous works did not consider a realistic model for a moving soft tissue like heart tissue in beating heart surgery. Here, a new model is proposed to indicate significant characteristics of a moving ...

The Simulator: Towards a Richer Understanding of Adaptive Sampling in the Moderate-Confidence Regime

In this work, we propose a novel technique for analyzing adaptive sampling called the Simulator. Our approach differs from the existing methods by considering not how much information could be gathered by any fixed sampling strategy, but how difficult it is to distinguish a good sampling strategy from a bad one given the limited amount of data collected up to any given time. This change of pers...

Anytime optimal algorithms in stochastic multi-armed bandits - Supplementary Material

Semi-analytical Solution for Time-dependent Creep Analysis of Rotating Cylinders Made of Anisotropic Exponentially Graded Material (EGM)

In the present paper, the time-dependent creep behavior of hollow circular rotating cylinders made of exponentially graded material (EGM) is investigated. Loading is composed of an internal pressure, a distributed temperature field due to steady state heat conduction with convective boundary condition and a centrifugal body force. All the material properties are assumed to be exponentially graded a...

Publication date: 2011
